Interpretable representation learning for 3D multi-piece intracellular structures using point clouds

A key challenge in understanding subcellular organization is quantifying interpretable measurements of intracellular structures with complex multi-piece morphologies in an objective, robust and generalizable manner. Here we introduce a morphology-appropriate representation learning framework that uses 3D rotation invariant autoencoders and point clouds. This framework is used to learn representations of complex multi-piece morphologies that are independent of orientation, compact, and easy to interpret. We apply our framework to intracellular structures with punctate morphologies (e.g. DNA replication foci) and polymorphic morphologies (e.g. nucleoli). We systematically compare our framework to image-based autoencoders across several intracellular structure datasets, including a synthetic dataset with pre-defined rules of organization. We explore the trade-offs in the performance of different models by performing multi-metric benchmarking across efficiency, generative capability, and representation expressivity metrics. We find that our framework, which embraces the underlying morphology of multi-piece structures, facilitates the unsupervised discovery of sub-clusters for each structure. We show how our approach can also be applied to phenotypic profiling using a dataset of nucleolar images following drug perturbations. We implement and provide all representation learning models using CytoDL, a Python package for flexible and configurable deep learning experiments.

Figure S1 - Evaluation metrics for representation learning models. a) Overview of different evaluation metrics for quantifying the utility of each representation learning framework. Efficiency metrics include model size, inference time, and carbon emissions. Generative ability metrics include reconstruction error and evolution energy. Representation expressivity metrics include rotation invariance error, interpolation distance, feature regression, classification accuracy, and compactness. b) Workflow for interpolation distance and evolution energy calculation. Two samples are drawn randomly from the population, and a linear interpolation is performed between their representations. The interpolation distance is the Euclidean distance between an interpolated representation and the nearest real representation; it is averaged across many interpolations to give the average interpolation distance. Each interpolated representation is also passed through the decoder to obtain a reconstruction. The energy of deformation [7] is the sum of the reconstruction errors between the interpolated reconstruction and the reconstructions of the initial and final shapes, normalized by the reconstruction error between the initial and final shapes. The energy of deformation is averaged across many interpolations to compute the evolution energy. Both evolution energy and average interpolation distance are averaged across many random pairs of samples from the test set. c) Holistic evaluation of metrics. Metrics are z-scored across models per metric. Z-scored metrics are visualized using a polar plot, with the sign flipped for metrics where lower is better (indicated by a negative sign).
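The interpolation-based metrics and the z-scoring step described for Figure S1 can be illustrated with a minimal sketch. It is not the CytoDL implementation: the function names, the evenly spaced interpolation weights, the use of the population standard deviation, and the toy identity "decoder" with a Euclidean "reconstruction error" are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def lerp(z_a, z_b, t):
    """Linear interpolation between two representation vectors (0 <= t <= 1)."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interior_steps(n_steps):
    """Evenly spaced interpolation weights strictly between 0 and 1 (assumed spacing)."""
    return [i / (n_steps + 1) for i in range(1, n_steps + 1)]


def interpolation_distance(z_a, z_b, real_reps, n_steps=5):
    """Mean Euclidean distance from each interpolant to its nearest real representation."""
    return mean(
        min(math.dist(lerp(z_a, z_b, t), r) for r in real_reps)
        for t in interior_steps(n_steps)
    )


def evolution_energy(z_a, z_b, decode, recon_error, n_steps=5):
    """Mean energy of deformation along the interpolation path between two samples."""
    x_a, x_b = decode(z_a), decode(z_b)
    baseline = recon_error(x_a, x_b)  # error between the initial and final shapes
    energies = []
    for t in interior_steps(n_steps):
        x_t = decode(lerp(z_a, z_b, t))
        energies.append((recon_error(x_t, x_a) + recon_error(x_t, x_b)) / baseline)
    return mean(energies)


def zscore_for_polar_plot(metric_by_model, lower_is_better):
    """Z-score one metric across models, flipping the sign when lower is better."""
    values = list(metric_by_model.values())
    mu, sigma = mean(values), pstdev(values)  # population std is an assumption
    sign = -1.0 if lower_is_better else 1.0
    return {name: sign * (v - mu) / sigma for name, v in metric_by_model.items()}


# Toy usage: identity "decoder" and Euclidean "reconstruction error" (both hypothetical).
reps = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
avg_dist = interpolation_distance(reps[0], reps[1], reps, n_steps=1)
energy = evolution_energy(reps[0], reps[1], decode=lambda z: z, recon_error=math.dist)
scores = zscore_for_polar_plot({"model_a": 1.0, "model_b": 3.0}, lower_is_better=True)
```

In practice both metrics would be averaged over many random pairs drawn from the test set, as the caption describes. After the sign flip, a larger z-score means a better model on every axis of the polar plot.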

Figure S2 - Testing orientation invariance for image and point cloud models for the cellPACK synthetic dataset. a) (Top row) Example image input for the planar 45 rule is rotated by four 90 degree rotations. (Middle row) Reconstructions using the classical image model (upper) and rotation invariant image model (lower) for each rotated input. The reconstructions using the rotation invariant model are pose-corrected using the learned rotation angles. (Bottom row) Rotation invariant reconstructions using the rotation invariant image model for each rotated input. b) (Top row) Example point cloud input for the planar 45 rule is rotated by four 90 degree rotations. (Middle row) Reconstructions using the classical point cloud model (upper) and rotation invariant point cloud model (lower) for each rotated input. The reconstructions using the rotation invariant model are pose-corrected using the learned rotation angles. (Bottom row) Rotation invariant reconstructions using the rotation invariant point cloud model for each rotated input. All reconstructions shown are max projections in Z.
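The rotation test in Figure S2 can be mimicked on a toy point cloud. The sketch below rests on several assumptions: the rotations are taken about the Z axis (the caption does not name the axis), the two descriptors are hypothetical stand-ins for a rotation invariant and a classical encoder, and `invariance_error` is one plausible way to score sensitivity to rotation rather than the paper's definition of rotation invariance error.

```python
import math


def rotate_z_90(points, k):
    """Rotate (x, y, z) points by k * 90 degrees about the Z axis (exact, no trigonometry)."""
    out = list(points)
    for _ in range(k % 4):
        out = [(-y, x, z) for x, y, z in out]
    return out


def toy_invariant_descriptor(points):
    """Sorted pairwise distances: unchanged by any rigid rotation of the input."""
    n = len(points)
    return sorted(
        math.dist(points[i], points[j]) for i in range(n) for j in range(i + 1, n)
    )


def toy_sensitive_descriptor(points):
    """Flattened raw coordinates: changes whenever the input is rotated."""
    return [float(c) for p in points for c in p]


def invariance_error(descriptor, points):
    """Largest descriptor change across the four 90 degree rotations (hypothetical score)."""
    reference = descriptor(points)
    return max(
        math.dist(reference, descriptor(rotate_z_90(points, k))) for k in range(4)
    )


# Toy multi-piece point cloud with integer coordinates.
cloud = [(1, 0, 0), (3, 1, 0), (0, 2, 1), (2, 2, 2)]
err_invariant = invariance_error(toy_invariant_descriptor, cloud)
err_sensitive = invariance_error(toy_sensitive_descriptor, cloud)
```

The invariant descriptor gives an error near zero for all four orientations, while the coordinate-based descriptor does not, which is the qualitative contrast the figure draws between rotation invariant and classical models.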

Figure S5 - 3D image preprocessing into application-appropriate inputs for punctate structures. Workflow for generating 4D point clouds from 3D intensity images. a) Single-cell intensity images are obtained by masking via a dilated nuclear mask (for nuclear structures), followed by alignment to the longest axis of the nuclear mask. Intensities were then scaled using an exponential function and converted to probabilities. These probabilities were used to sample a dense 4D point cloud with 20480 points and XYZ + intensity coordinates. During training, a sparse point cloud with 2048 points was sampled from this dense point cloud using the intensities as probabilities. The intensity coordinate was scaled by a factor of 0.1 so that intensity values were in the same range as the XYZ coordinate values. b) Examples of dense and sparse samples for each cell cycle stage in the PCNA dataset. Shown are the center slice of the raw intensity image, the center slice of the raw intensity image overlaid with the dense sample, and the center slice of the raw intensity image overlaid with the sparse training sample. c) Examples of dense sample and sparse sample for each punctate structure from the WTC-11 hiPSC Single-Cell Image
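The two-stage sampling in panel a can be sketched with the standard library alone. This is an illustrative reading of the caption, not the authors' pipeline: the dictionary representation of a masked image, the exact form of the exponential scaling (here `exp` of the peak-normalized intensity), sampling with replacement, and the function names are all assumptions. Only the point counts (20480 and 2048) and the 0.1 intensity scale factor come from the caption.

```python
import math
import random


def sample_dense_point_cloud(voxels, n_points=20480, rng=None):
    """Sample (x, y, z, intensity) points from a masked image.

    voxels maps (x, y, z) -> raw intensity; voxels outside the mask are omitted.
    """
    rng = random.Random(0) if rng is None else rng
    coords = list(voxels)
    peak = max(voxels.values())
    # Assumed exponential scaling of the peak-normalized intensity.
    scaled = [math.exp(voxels[c] / peak) for c in coords]
    total = sum(scaled)
    probabilities = [s / total for s in scaled]  # convert to probabilities
    picks = rng.choices(coords, weights=probabilities, k=n_points)
    return [(x, y, z, voxels[(x, y, z)]) for x, y, z in picks]


def sample_sparse_point_cloud(dense, n_points=2048, intensity_scale=0.1, rng=None):
    """Subsample the dense cloud using intensities as weights, then rescale intensity."""
    rng = random.Random(0) if rng is None else rng
    weights = [p[3] for p in dense]
    picks = rng.choices(dense, weights=weights, k=n_points)
    return [(x, y, z, intensity_scale * i) for x, y, z, i in picks]


# Toy 4 x 4 x 4 "image" with a dim background and one bright punctum.
toy_voxels = {(x, y, z): 1.0 for x in range(4) for y in range(4) for z in range(4)}
toy_voxels[(2, 2, 2)] = 10.0

dense_cloud = sample_dense_point_cloud(toy_voxels)
sparse_cloud = sample_sparse_point_cloud(dense_cloud)
```

Because the sparse cloud is redrawn from the dense cloud, each training step can see a different 2048-point sample of the same cell while bright voxels remain more likely to be represented.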

Figure S6 - Evaluation of test set model reconstructions for the DNA replication foci dataset. Test set center slice inputs (a, e, g) and reconstructions using b) classical image model, c) rotation invariant image model, d) an alternative classical image model via a masked autoencoder with a vision transformer as an encoder (MAE-ViT), f) classical point cloud model, and h) rotation invariant point cloud model for samples from each of the 8 cell cycle stages. Both pose-corrected and rotation invariant reconstructions are shown for the rotation invariant models.

Figure S8 - 3D image preprocessing into application-appropriate inputs for polymorphic structures. a) Workflow for computing signed distance function (SDF) images from segmentations. Single-cell structure segmentations are masked by the nuclear segmentation (for nuclear structures), followed by meshing, centering, and hole filling. The mesh is then rescaled to 32^3 cube resolution and processed to get a signed distance function. Alternatively, the rescaled mesh is voxelized to get a segmentation. The SDF is clipped to the (-2, 2) range for training image models to focus the models on the zero level set. Example shown is for nucleoli (GC). b) Visualization of rescaled segmentation and SDF for examples with different numbers of pieces of the granular component (GC) of nucleoli. Shown are center slices of raw intensity images, max projection of the structure segmentation, max projection of the voxelized rescaled segmentation, and center slice of the rescaled mesh SDF. c) Visualization of rescaled segmentation and SDF for other polymorphic structures from the WTC-11 hiPSC Single-Cell Image Dataset v1 including lysosomes, Golgi, GC nucleoli, and dense fibrillar component (DFC) nucleoli.
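The effect of clipping the SDF to (-2, 2) can be seen on a small voxel grid. Note that this sketch swaps in a different technique from the caption: it computes a brute-force signed distance between voxel centers of a binary segmentation, whereas the workflow above derives the SDF from a rescaled mesh. The sign convention (negative inside, positive outside), the 8^3 toy grid, and the function name are assumptions.

```python
import math
from itertools import product


def clipped_voxel_sdf(mask, clip=2.0):
    """Brute-force signed distance on a voxel grid, clipped to [-clip, clip].

    mask maps every (x, y, z) in the grid to True (inside) or False (outside).
    Assumed sign convention: negative inside the structure, positive outside.
    """
    inside = [v for v, is_in in mask.items() if is_in]
    outside = [v for v, is_in in mask.items() if not is_in]
    sdf = {}
    for voxel, is_in in mask.items():
        if is_in:
            distance = -min(math.dist(voxel, o) for o in outside)
        else:
            distance = min(math.dist(voxel, i) for i in inside)
        sdf[voxel] = max(-clip, min(clip, distance))
    return sdf


def in_cube(voxel, low, high):
    """True if every coordinate of voxel lies in [low, high]."""
    return all(low <= c <= high for c in voxel)


# Toy two-piece structure on an 8 x 8 x 8 grid (the caption uses 32^3).
grid = list(product(range(8), repeat=3))
toy_mask = {v: in_cube(v, 1, 2) or in_cube(v, 5, 6) for v in grid}
toy_sdf = clipped_voxel_sdf(toy_mask)
```

After clipping, every voxel more than two units from a surface carries the same value, so an image model's reconstruction loss is dominated by voxels near the zero level set. The brute-force search is only practical for a toy grid; at 32^3 resolution a dedicated distance transform would be the usual choice.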